Development of the user-friendly decision aid Rule-based Evaluation and Support Tool (REST) for optimizing the resources of an information extraction task
Bazin, Guillaume, Tannier, Xavier, Adda, Fanny, Cohen, Ariel, Redjdal, Akram, Kempf, Emmanuelle
Rules can be a default option for information extraction (IE), compared with ML and LLMs, in terms of sustainability, transferability, interpretability, and development burden. We suggest a sustainable, combined use of rules and ML as an IE method. Our approach starts with exhaustive expert manual highlighting of a representative subset of the data corpus in a single working session. We developed and validated the feasibility and performance metrics of the REST decision tool, which helps the annotator choose between rules, as the default option, and ML for each entity of an IE task. REST lets the annotator visualize the characteristics of each entity's formalization in the free texts, together with the expected rule development feasibility and IE performance metrics. ML is treated as a backup IE option, so manual annotation for training is minimized. External validation of REST on a 12-entity use case showed good reproducibility.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > France (0.04)
- Africa > South Africa > Indian Ocean (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.69)
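The rules-as-default idea above can be sketched as follows. This is a hypothetical illustration, not the REST tool itself: a regex-based extractor for one clinical entity (tumour stage mentions), of the kind an annotator might prefer over ML when the entity's formalization in free text is regular.

```python
import re

# Hypothetical rule for one entity: TNM-style and "stage X" mentions.
# If such a rule covers the entity well, ML is kept only as a backup.
STAGE_PATTERN = re.compile(
    r"\b(?:stage\s+(?:I{1,3}V?|IV)|p?T[0-4]N[0-3]M[01])\b",
    re.IGNORECASE,
)

def extract_stage(text):
    """Return all rule-matched stage mentions in a free-text note."""
    return [m.group(0) for m in STAGE_PATTERN.finditer(text)]

notes = "Patient diagnosed with stage III disease; staging pT2N1M0 confirmed."
print(extract_stage(notes))
```

The pattern and entity choice are illustrative assumptions; the point is that a single regular expression can replace an annotated training set for sufficiently regular entities.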
A national longitudinal dataset of skills taught in U.S. higher education curricula
Sabet, Alireza Javadian, Bana, Sarah H., Yu, Renzhe, Frank, Morgan R.
Higher education plays a critical role in driving an innovative economy by equipping students with knowledge and skills demanded by the workforce. While researchers and practitioners have developed data systems to track detailed occupational skills, such as those established by the U.S. Department of Labor (DOL), much less effort has been made to document skill development in higher education at a similar granularity. Here, we fill this gap by presenting a longitudinal dataset of skills inferred from over three million course syllabi taught at nearly three thousand U.S. higher education institutions. To construct this dataset, we apply natural language processing to extract from course descriptions the detailed work activities (DWAs) used by the DOL to describe occupations. We then aggregate these DWAs to create skill profiles for institutions and academic majors. Our dataset offers a large-scale representation of the skills of college-educated workers and their role in the economy. To showcase its utility, we use it to 1) compare the similarity of skills taught with skills in the workforce according to the U.S. Bureau of Labor Statistics, 2) estimate gender differences in acquired skills based on enrollment data, 3) depict temporal trends in the skills taught in social science curricula, and 4) connect college majors' skill distinctiveness to salary differences of graduates. Overall, this dataset can enable new research on the sources of skills in the context of workforce development and provide actionable insights for shaping the future of higher education to meet evolving labor demands, especially in the face of new technologies.
- North America > United States > Idaho (0.14)
- North America > United States > Wyoming (0.14)
- North America > United States > Utah (0.14)
- (58 more...)
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Education > Educational Setting > Higher Education (1.00)
- Education > Curriculum (1.00)
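The matching of course descriptions to DWAs can be pictured with a minimal sketch. This is not the authors' pipeline; it is a hedged bag-of-words illustration with made-up example DWA strings, assigning a course description to its most cosine-similar DWA.

```python
import math
from collections import Counter

# Example DWA statements (illustrative, not from the DOL taxonomy).
DWAS = [
    "analyze data to identify trends",
    "prepare financial reports",
    "write computer programs",
]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in a if t in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def best_dwa(course_description):
    """Return the DWA most similar to the course description."""
    q = Counter(course_description.lower().split())
    scores = [cosine(q, Counter(d.split())) for d in DWAS]
    return DWAS[scores.index(max(scores))]

print(best_dwa("Students learn to write programs and analyze algorithms"))
```

A production pipeline would use richer text representations, but aggregating such per-course matches over syllabi yields institution- and major-level skill profiles of the kind the dataset provides.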
MUDGUARD: Taming Malicious Majorities in Federated Learning using Privacy-Preserving Byzantine-Robust Clustering
Wang, Rui, Wang, Xingkai, Chen, Huanhuan, Decouchant, Jérémie, Picek, Stjepan, Laoutaris, Nikolaos, Liang, Kaitai
Byzantine-robust Federated Learning (FL) aims to counter malicious clients and train an accurate global model while maintaining an extremely low attack success rate. Most existing systems, however, are only robust when most of the clients are honest. FLTrust (NDSS '21) and Zeno++ (ICML '20) do not make such an honest-majority assumption but can only be applied to scenarios where the server is provided with an auxiliary dataset used to filter malicious updates. FLAME (USENIX '22) and EIFFeL (CCS '22) maintain the semi-honest majority assumption to guarantee robustness and the confidentiality of updates. It is therefore currently impossible to ensure Byzantine robustness and confidentiality of updates without assuming a semi-honest majority. To tackle this problem, we propose a novel Byzantine-robust and privacy-preserving FL system, called MUDGUARD, that can operate under a malicious minority or majority on both the server and client sides. Based on DBSCAN, we design a new method for extracting features from model updates via pairwise adjusted cosine similarity to boost the accuracy of the resulting clustering. To thwart attacks from a malicious majority, we develop a method called Model Segmentation, which aggregates only the updates from within a cluster and sends the corresponding model only to the clients of that cluster. The fundamental idea is that even if malicious clients are in the majority, their poisoned updates cannot harm benign clients as long as they are confined to the malicious cluster. We also leverage multiple cryptographic tools to conduct clustering without sacrificing training correctness and update confidentiality. We present a detailed security proof and empirical evaluation, along with a convergence analysis, for MUDGUARD.
- Africa > Cameroon > Gulf of Guinea (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Africa > South Africa > Indian Ocean (0.04)
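The pairwise adjusted cosine similarity feature described above can be sketched in a few lines. This is a simplified stand-in for the MUDGUARD feature-extraction step only (the system additionally uses DBSCAN and cryptographic protocols): cosine similarity is computed after centering each update coordinate by its mean across clients, which separates clients pushing the model in a common direction from outliers.

```python
import numpy as np

def adjusted_cosine_matrix(updates):
    """Pairwise adjusted cosine similarity between client updates."""
    U = np.asarray(updates, dtype=float)
    C = U - U.mean(axis=0)                      # center each coordinate
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    C = C / np.where(norms == 0, 1, norms)      # row-normalize
    return C @ C.T

# Two benign clients pushing in one direction, one attacker in another.
updates = [[1.0, 1.0], [0.9, 1.1], [-1.0, -1.0]]
S = adjusted_cosine_matrix(updates)
print(np.round(S, 2))
```

In the similarity matrix, the two benign clients score near +1 with each other and near -1 with the attacker, which is the kind of structure a density-based clusterer can exploit.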
Redundancy parameterization and inverse kinematics of 7-DOF revolute manipulators
Elias, Alexander J., Wen, John T.
Seven degree-of-freedom (DOF) robot arms have one redundant DOF which does not change the translational or rotational motion of the end effector. The redundant DOF offers greater manipulability of the arm configuration to avoid obstacles and steer away from singularities, but it must be parameterized to fully specify the joint angles for a given end effector pose. For 7-DOF revolute (7R) manipulators, we introduce a new concept of generalized shoulder-elbow-wrist (SEW) angle, a generalization of the conventional SEW angle but with an arbitrary choice of the reference direction function. The SEW angle is easy for human operators to visualize as a rotation of the elbow about the line from the shoulder to the wrist and has been used in the teleoperation of space robot arms. Since the conventional SEW angle formulation is prone to singularities, we introduce a special choice of the reference direction function called the stereographic SEW angle which has a singularity in only one direction in the workspace. We prove that such a singularity is unavoidable for any parameterization. We also include expressions for the SEW angle Jacobian along with singularity analysis. Finally, we provide inverse kinematics solutions for most known 7R manipulators using the general SEW angle and the subproblem decomposition method. These solutions are often closed-form but may sometimes involve a 1D or 2D search. Inverse kinematics solutions, examples, and evaluations are available in a publicly accessible repository.
- North America > United States > New York > Rensselaer County > Troy (0.04)
- North America > United States > Arizona (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Africa > South Africa > Indian Ocean (0.04)
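The conventional SEW angle that the paper generalizes can be computed directly. This sketch (not the paper's stereographic variant) measures the signed rotation of the elbow about the shoulder-to-wrist line, relative to a fixed reference direction; the positions and reference vector below are illustrative.

```python
import numpy as np

def sew_angle(S, E, W, ref):
    """Signed elbow rotation about the shoulder-wrist line from ref."""
    axis = W - S
    axis = axis / np.linalg.norm(axis)
    def perp(v):                       # component perpendicular to axis
        return v - axis * (axis @ v)
    u, r = perp(E - S), perp(ref)
    return np.arctan2(axis @ np.cross(r, u), r @ u)

S = np.array([0.0, 0.0, 0.0])          # shoulder
W = np.array([0.0, 0.0, 1.0])          # wrist
ref = np.array([1.0, 0.0, 0.0])        # fixed reference direction
E = np.array([0.0, 0.5, 0.5])          # elbow, rotated 90 deg about SW
print(sew_angle(S, E, W, ref))
```

The singularity the abstract mentions is visible here: when the projected reference `r` vanishes (shoulder-wrist line parallel to `ref`), the angle is undefined, which motivates the paper's choice of a reference direction function with a singularity in only one workspace direction.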
Efficacy of MRI data harmonization in the age of machine learning. A multicenter study across 36 datasets
Marzi, Chiara, Giannelli, Marco, Barucci, Andrea, Tessa, Carlo, Mascalchi, Mario, Diciotti, Stefano
Pooling publicly available MRI data from multiple sites makes it possible to assemble extensive groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. Harmonization of multicenter data is necessary to reduce the confounding effect of non-biological sources of variability in the data. However, when applied to the entire dataset before machine learning, harmonization leads to data leakage, because information from outside the training set may affect model building and potentially lead to falsely overestimated performance. We propose 1) a measurement of the efficacy of data harmonization and 2) a harmonizer transformer, i.e., an implementation of ComBat harmonization that can be encapsulated among the preprocessing steps of a machine learning pipeline, avoiding data leakage. We tested these tools using brain T1-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced, and we demonstrated the data leakage effect in predicting individual age from MRI data, highlighting that introducing the harmonizer transformer into a machine learning pipeline avoids data leakage.
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
- (16 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
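The harmonizer-transformer idea can be sketched with a toy fit/transform class. This is not the authors' ComBat implementation: ComBat also models scale and biological covariates, whereas this sketch only removes per-site offsets. The key point it illustrates is that site parameters are estimated on the training split alone (fit) and then applied to any split (transform), so test data never influences harmonization.

```python
import numpy as np

class SiteOffsetHarmonizer:
    """Toy harmonizer: shifts each site's mean to the training grand mean."""

    def fit(self, X, sites):
        X, sites = np.asarray(X, float), np.asarray(sites)
        self.grand_mean_ = X.mean(axis=0)
        self.site_means_ = {s: X[sites == s].mean(axis=0)
                            for s in np.unique(sites)}
        return self

    def transform(self, X, sites):
        X = np.asarray(X, float).copy()
        sites = np.asarray(sites)
        for s, m in self.site_means_.items():
            X[sites == s] -= m - self.grand_mean_   # remove site offset
        return X

X_train = [[1.0], [2.0], [11.0], [12.0]]    # site B shifted by +10
sites_train = ["A", "A", "B", "B"]
h = SiteOffsetHarmonizer().fit(X_train, sites_train)
print(h.transform(X_train, sites_train).ravel())
```

Because fit and transform are separated, the class can sit inside a cross-validated pipeline, which is exactly what prevents the data leakage the abstract warns about.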
A Comprehensive Analysis of Acknowledgement Texts in Web of Science: a case study on four scientific domains
Analysis of acknowledgments is particularly interesting because acknowledgments may give information not only about funding, but can also reveal hidden contributions to authorship, researchers' collaboration patterns, the context in which research was conducted, and specific aspects of the academic work. The focus of the present research is the analysis of a large sample of acknowledgement texts indexed in the Web of Science (WoS) Core Collection. Records of types 'article' and 'review' from four scientific domains, namely social sciences, economics, oceanography, and computer science, published from 2014 to 2019 in English-language scientific journals, were considered. Six types of acknowledged entities, i.e., funding agency, grant number, individual, university, corporation, and miscellaneous, were extracted from the acknowledgement texts using a Named Entity Recognition (NER) tagger and subsequently examined. A general analysis of the acknowledgement texts showed that indexing of funding information in WoS is incomplete. The analysis of the automatically extracted entities revealed differences and distinct patterns in the distribution of acknowledged entities of different types across scientific domains. A strong association was found between acknowledged entity and scientific domain, and between acknowledged entity and entity type. Only a negligible correlation was found between the number of citations and the number of acknowledged entities. Generally, the number of words in an acknowledgement text correlates positively with the number of acknowledged funding organizations, universities, individuals, and miscellaneous entities. At the same time, acknowledgement texts with a larger number of sentences mention more individuals and miscellaneous entities.
- Oceania > Australia (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (8 more...)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.46)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government > Regional Government (0.68)
- Energy (0.68)
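One of the entity types above, grant numbers, lends itself to a toy illustration. This is not the NER tagger used in the study; it is a hedged regex sketch over an invented acknowledgement sentence, showing the kind of surface patterns such entities exhibit.

```python
import re

# Illustrative pattern: an uppercase code, optional hyphen/slash,
# followed by at least four digits, optionally prefixed by "No.".
GRANT = re.compile(r"\b(?:No\.?\s*)?([A-Z]{2,}[-/]?\d{4,})\b")

ack = ("This work was supported by NSF grant ABC-12345 and by "
       "DFG project XY-9876 under grant No. KL2021.")
print(GRANT.findall(ack))
```

Real taggers are learned rather than hand-written precisely because grant identifiers vary wildly across funders, which is one reason the study relies on an NER model instead of rules.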
OAG-BERT: Towards A Unified Backbone Language Model For Academic Knowledge Services
Liu, Xiao, Yin, Da, Zheng, Jingnan, Zhang, Xingjian, Zhang, Peng, Yang, Hongxia, Dong, Yuxiao, Tang, Jie
Academic knowledge services have substantially facilitated the development of the science enterprise by providing a plenitude of efficient research tools. However, many applications highly depend on ad-hoc models and expensive human labeling to understand scientific content, hindering deployment in real products. To build a unified backbone language model for different knowledge-intensive academic applications, we pre-train an academic language model, OAG-BERT, that integrates both the heterogeneous entity knowledge and the scientific corpora in the Open Academic Graph (OAG) -- the largest public academic graph to date. In OAG-BERT, we develop strategies for pre-training on text and entity data, along with zero-shot inference techniques. Its zero-shot capability furthers the path to reducing the need for expensive annotations. OAG-BERT has been deployed in real-world applications, such as the reviewer recommendation function for the National Natural Science Foundation of China (NSFC) -- one of the largest funding agencies in China -- and paper tagging in AMiner. All code and pre-trained models are available via the CogDL toolkit.
- Asia > China (0.45)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- (11 more...)
- Health & Medicine (1.00)
- Information Technology > Services (0.68)
The First Principles of Deep Learning and Compression
The deep learning revolution incited by the 2012 AlexNet paper has been transformative for the field of computer vision. Many problems which were severely limited using classical solutions are now seeing unprecedented success. The rapid proliferation of deep learning methods has led to a sharp increase in their use in consumer and embedded applications. These applications, in turn, depend on lossy multimedia compression, which is required to engineer the efficient storage and transmission of data in real-world scenarios. As such, there has been increased interest in deep learning solutions for multimedia compression which would allow for higher compression ratios and increased visual quality. The deep learning approach to multimedia compression, so-called Learned Multimedia Compression, involves computing a compressed representation of an image or video using a deep network for the encoder and the decoder. While these techniques have enjoyed impressive academic success, their industry adoption has been essentially non-existent. Classical compression techniques like JPEG and MPEG are too entrenched in modern computing to be easily replaced. This dissertation takes an orthogonal approach and leverages deep learning to improve the compression fidelity of these classical algorithms. This allows the incredible advances in deep learning to be used for multimedia compression without threatening the ubiquity of the classical methods. The key insight of this work is that methods which are motivated by first principles, i.e., the underlying engineering decisions that were made when the compression algorithms were developed, are more effective than general methods. By encoding prior knowledge into the design of the algorithm, the flexibility, performance, and/or accuracy are improved at the cost of generality...
- South America > Peru (0.13)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Summary/Review (0.87)
- Education (1.00)
- Information Technology (0.92)
- Leisure & Entertainment (0.92)
- (2 more...)
Dispensed Transformer Network for Unsupervised Domain Adaptation
Li, Yunxiang, Li, Jingxiong, Dan, Ruilong, Wang, Shuai, Jin, Kai, Zeng, Guodong, Wang, Jun, Pan, Xiangji, Zhang, Qianni, Zhou, Huiyu, Jin, Qun, Wang, Li, Wang, Yaqi
Accurate segmentation is a crucial step in medical image analysis, and supervised machine learning has proven effective for segmenting organs and lesions. However, it is costly to perform the data annotation that provides ground-truth labels for training supervised algorithms, and the high variance of data from different domains tends to severely degrade performance on cross-site or cross-modality datasets. To mitigate this problem, a novel unsupervised domain adaptation (UDA) method named dispensed Transformer network (DTNet) is introduced in this paper. Our DTNet contains three modules. First, a dispensed residual transformer block is designed, which realizes global attention by a dispensed interleaving operation and addresses the excessive computational cost and GPU memory usage of the Transformer. Second, a multi-scale consistency regularization is proposed to alleviate the loss of detail in the low-resolution output for better feature alignment. Finally, a feature ranking discriminator is introduced to automatically assign different weights to domain-gap features to lessen the feature distribution distance, reducing the performance shift between the two domains. The proposed method is evaluated on a large fluorescein angiography (FA) retinal nonperfusion (RNP) cross-site dataset with 676 images and on a widely used cross-modality dataset from the MM-WHS challenge. Extensive results demonstrate that our proposed network achieves the best performance in comparison with several state-of-the-art techniques.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
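The multi-scale consistency regularization mentioned above can be given as a minimal numeric sketch. This is a hedged illustration under assumed definitions, not the paper's exact formulation: a low-resolution prediction is penalized for disagreeing with a 2x-downsampled version of the high-resolution prediction.

```python
import numpy as np

def downsample2x(p):
    """Average-pool a 2D prediction map by a factor of two."""
    h, w = p.shape
    return p.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def consistency_loss(pred_highres, pred_lowres):
    """Mean squared disagreement between scales."""
    return float(np.mean((downsample2x(pred_highres) - pred_lowres) ** 2))

hi = np.ones((4, 4))
lo = np.ones((2, 2))
print(consistency_loss(hi, lo))
```

When both scales agree, the term is zero; disagreement is penalized quadratically, pushing the low-resolution output to retain the detail present at high resolution.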
New Hybrid Neuro-Evolutionary Algorithms for Renewable Energy and Facilities Management Problems
This Ph.D. thesis deals with the optimization of the development of several renewable energy resources, as well as the improvement of facilities management in oceanic engineering and airports, using hybrid computational methods from AI. Energy is essential to our society to ensure a good quality of life. This means that predictions of the characteristics on which renewable energies depend are necessary in order to know the amount of energy that will be obtained at any time. The second topic tackled in this thesis is related to the basic parameters that influence different marine activities and airports, knowledge of which is necessary for proper facilities management in these environments. Within this work, a study of state-of-the-art Machine Learning has been performed to solve the problems associated with the above-mentioned topics, and several contributions have been proposed. One pillar of this work is focused on the estimation of the most important parameters in the exploitation of renewable resources. The second contribution of this thesis is related to feature selection problems. The proposed methodologies are applied to multiple problems: the prediction of $H_s$, relevant for marine energy applications and marine activities; the estimation of WPREs, undesirable variations in the electric power produced by a wind farm; the prediction of global solar radiation in areas of Spain and Australia, which is important for solar energy; and the prediction of low-visibility events at airports. All of these practical issues are developed with the consequent prior data analysis, normally in terms of meteorological variables.
- North America > United States (1.00)
- Europe > Spain (0.88)
- Europe > Norway (0.45)
- (11 more...)
- Research Report > New Finding (0.92)
- Overview (0.92)
- Transportation > Infrastructure & Services > Airport (1.00)
- Energy > Renewable > Wind (1.00)
- Energy > Renewable > Solar (1.00)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- (3 more...)
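The evolutionary feature-selection contribution above can be pictured with a minimal sketch. This is a generic stand-in, not the thesis's hybrid neuro-evolutionary methods: a (1+1) evolutionary algorithm flips bits of a feature mask on synthetic data and keeps a child mask when it does not worsen the least-squares error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3]              # only features 0 and 3 matter

def error(mask):
    """Least-squares training error using only the masked features."""
    if not mask.any():
        return float(np.mean(y ** 2))
    Xs = X[:, mask]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.mean((y - Xs @ coef) ** 2))

mask = rng.random(6) < 0.5                 # random initial feature mask
for _ in range(200):
    child = mask ^ (rng.random(6) < 1 / 6)  # flip each bit w.p. 1/6
    if error(child) <= error(mask):
        mask = child                        # keep the non-worse mask

print(mask, error(mask))
```

On this noiseless toy problem the search settles on masks containing the two informative features, driving the error to zero; the thesis couples such evolutionary search with neural predictors on real meteorological data.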